Properly Learning Poisson Binomial Distributions in Almost Polynomial Time
Abstract
We give an algorithm for properly learning Poisson binomial distributions. A Poisson binomial distribution (PBD) of order n ∈ ℤ₊ is the discrete probability distribution of the sum of n mutually independent Bernoulli random variables. Given Õ(1/ε²) samples from an unknown PBD P, our algorithm runs in time (1/ε)^{O(log log(1/ε))} and outputs a hypothesis PBD that is ε-close to P in total variation distance. The sample complexity of our algorithm is known to be nearly optimal, up to logarithmic factors, as established in previous work [DDS12]. However, the previously best known running time for properly learning PBDs [DDS12, DKS15b] was (1/ε)^{O(log(1/ε))}, and was essentially obtained by enumeration over an appropriate ε-cover. We remark that the running time of this cover-based approach cannot be improved, as any ε-cover for the space of PBDs has size (1/ε)^{Ω(log(1/ε))} [DKS15b]. As one of our main contributions, we provide a novel structural characterization of PBDs, showing that any PBD P is ε-close to another PBD Q with O(log(1/ε)) distinct parameters. More precisely, we prove that, for all ε > 0, there exists an explicit collection M of (1/ε)^{O(log log(1/ε))} vectors of multiplicities, such that for any PBD P there exists a PBD Q with O(log(1/ε)) distinct parameters whose multiplicities are given by some element of M, such that Q is ε-close to P. Our proof combines tools from Fourier analysis and algebraic geometry. Our approach to the proper learning problem is as follows: starting with an accurate non-proper hypothesis, we fit a PBD to this hypothesis. More specifically, we essentially start with the hypothesis computed by the computationally efficient non-proper learning algorithm in our recent work [DKS15b]. Our aforementioned structural characterization allows us to reduce the corresponding fitting problem to a collection of (1/ε)^{O(log log(1/ε))} systems of low-degree polynomial inequalities. We show that each such system can be solved in time (1/ε)^{O(log log(1/ε))}, which yields the overall running time of our algorithm.
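To make the two central definitions concrete, the following is a minimal Python sketch, not the learning algorithm described in the abstract. It computes the probability mass function of a PBD from its Bernoulli parameters by dynamic programming, and measures the total variation distance between two PBDs. The function names `pbd_pmf` and `total_variation` and the example parameters are illustrative assumptions, chosen only to show a PBD with four distinct parameters being approximated by one with two distinct parameters.

```python
from typing import List, Sequence


def pbd_pmf(probs: Sequence[float]) -> List[float]:
    """PMF of the sum of independent Bernoulli(p_i) variables (hypothetical helper)."""
    pmf = [1.0]  # distribution of the empty sum: all mass on 0
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this Bernoulli contributes 0
            nxt[k + 1] += mass * p      # this Bernoulli contributes 1
        pmf = nxt
    return pmf


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between two PMFs."""
    size = max(len(p), len(q))
    p = list(p) + [0.0] * (size - len(p))  # pad the shorter PMF with zeros
    q = list(q) + [0.0] * (size - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Illustrative parameters (assumed, not from the paper):
# P has four distinct parameters, Q has only two, each with multiplicity 2.
P = pbd_pmf([0.10, 0.12, 0.50, 0.52])
Q = pbd_pmf([0.11, 0.11, 0.51, 0.51])
print(total_variation(P, Q))
```

In this toy case the multiplicity vector of Q is (2, 2). The structural result in the abstract says that, for any PBD, some Q with O(log(1/ε)) distinct parameters and multiplicities drawn from the explicit collection M achieves distance at most ε.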
Supported by EPSRC grant EP/L021749/1 and a Marie Curie Career Integration grant. Some of this work was performed while visiting the University of Edinburgh. Supported by EPSRC grant EP/L021749/1.
Similar resources
Zero inflated Poisson and negative binomial regression models: application in education
Background: The number of failed courses and semesters in students are indicators of their performance. These amounts have zero inflated (ZI) distributions. Using ZI Poisson and negative binomial distributions we can model these count data to find the associated factors and estimate the parameters. This study aims to investigate the important factors related to the educational performance of ...
A continuous approximation fitting to the discrete distributions using ODE
Fitting probability density functions to discrete probability functions has always been needed and is very important. This paper fits continuous curves, which are probability density functions, to the binomial, negative binomial, geometric, Poisson and hypergeometric probability functions. The main key in these fittings is the use of the derivative concept and common differential ...
Learning Powers of Poisson Binomial Distributions
We introduce the problem of simultaneously learning all powers of a Poisson Binomial Distribution (PBD). A PBD over {1, ..., n} is the distribution of a sum X = ∑_{i=1}^{n} X_i of n independent Bernoulli 0/1 random variables X_i, where E[X_i] = p_i. The k'th power of this distribution, for k in a range {1, ..., m}, is the distribution of P_k = ∑_{i=1}^{n} X_i^{(k)}, where each Bernoulli random variable X...
On Learning and Covering Structured Distributions
We explore a number of problems related to learning and covering structured distributions. Hypothesis Selection: We provide an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. Namely, given a collection of N hypotheses containing at least one candidate that is ε-close to an unknown distribution, our algorithm outputs a candidate whi...
Promotion time models with time-changing exposure and heterogeneity: application to infectious diseases.
Promotion time models have been recently adapted to the context of infectious diseases to take into account discrete and multiple exposures. However, Poisson distribution of the number of pathogens transmitted at each exposure was a very strong assumption and did not allow for inter-individual heterogeneity. Bernoulli, the negative binomial, and the compound Poisson distributions were proposed ...
Publication date: 2016